A Robust Logical and 
Computational Characterisation of 
Peer-to-Peer Database Systems 



Enrico Franconi^1, Gabriel Kuper^2, Andrei Lopatenko^{1,3}, and Luciano Serafini^4

^1 Free University of Bozen-Bolzano, Faculty of Computer Science, Italy,
franconi@inf.unibz.it, alopatenko@unibz.it
^2 University of Trento, DIT, Italy, kuper@acm.org
^3 University of Manchester, Department of Computer Science, UK
^4 ITC-irst Trento, Italy, serafini@itc.it



Abstract. In this paper we give a robust logical and computational 
characterisation of peer-to-peer (p2p) database systems. We first define a 
precise model-theoretic semantics of a p2p system, which allows for local 
inconsistency handling. We then characterise the general computational 
properties for the problem of answering queries to such a p2p system. 
Finally, we devise tight complexity bounds and distributed procedures 
for the problem of answering queries in a few relevant special cases.



1 Introduction 

The first question we have to answer when working on a logical characterisation
of p2p database systems is the following: what is a p2p database system in the
logical sense? In general, a p2p database system is an integration system
composed of a set of (distributed) databases interconnected by means of some
sort of logically interpreted mappings. However, we also want to distinguish
p2p systems from standard classical logic-based integration systems, as
described for example in [?]. As a matter of fact, a p2p database system
should be understood as a collection of independent nodes where the directed
mappings between nodes serve only to define how data migrates from a set of
source nodes to a target node. This idea has already been clearly formulated
in [?], where a framework based on KFOL is informally proposed as a possible
solution.

Consider the following example. Suppose we have three distributed databases. 
The first one ($DB_1$) is the municipality's internal database, which has a table
Citizen-1. The second one ($DB_2$) is a public database, obtained from the
municipality's database, with two tables Male-2 and Female-2. The third database
($DB_3$) is the Pension Agency database, obtained from a public database, with
the table Citizen-3. The three databases are interconnected by means of the
following rules:

$$1 : \text{Citizen-1}(x) \Rightarrow 2 : (\text{Male-2}(x) \lor \text{Female-2}(x))$$

(this rule connects $DB_1$ with $DB_2$)

$$2 : \text{Male-2}(x) \Rightarrow 3 : \text{Citizen-3}(x)$$
$$2 : \text{Female-2}(x) \Rightarrow 3 : \text{Citizen-3}(x)$$

(these rules connect $DB_2$ with $DB_3$)

In the classical logical model, the Citizen-3 table in $DB_3$ should be filled with
all of the individuals in the Citizen-1 table in $DB_1$, since the following rule is
logically implied:

$$1 : \text{Citizen-1}(x) \Rightarrow 3 : \text{Citizen-3}(x)$$

However, in a p2p system this is not a desirable conclusion. In fact, rules should
be interpreted only for fetching data, and not for logical computation. In this
example, the tables Female-2 and Male-2 in $DB_2$ will be empty, since the data
is fetched from $DB_1$, where the gender of any specific entry in Citizen-1 is not
known. From the perspective of $DB_2$, the only thing that is known is that each
citizen is in the view (Female-2 $\lor$ Male-2). Therefore, when $DB_3$ asks for data
from $DB_2$, the result will be empty.
In other words, the rules

$$2 : \text{Male-2}(x) \Rightarrow 3 : \text{Citizen-3}(x)$$
$$2 : \text{Female-2}(x) \Rightarrow 3 : \text{Citizen-3}(x)$$

will transfer no data from $DB_2$ to $DB_3$, since no individual is known in $DB_2$ to
be either definitely a male (in which case the first rule would apply) or definitely
a female (in which case the second rule would apply). We only know that any
citizen in $DB_1$ is either male or female in $DB_2$, and no reasoning about the rules
should be allowed.
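
The contrast between the two readings can be replayed on a toy instance. The
following is a minimal sketch, not the paper's procedure: it assumes a
hypothetical Citizen-1 table with two entries, enumerates the ways $DB_2$ could
classify each citizen, and treats a fact as known at $DB_2$ only if it holds in
every such possibility. All names in the code are illustrative.

```python
from itertools import product

# Hypothetical contents of the Citizen-1 table in DB1 (an assumption).
citizens_db1 = {"ann", "bob"}

def models_of_db2(citizens):
    """Enumerate the ways DB2 may classify each citizen as Male-2, Female-2 or both.

    Simplification: only citizens are considered, which is enough to compute
    what is true in every model.
    """
    options = [("M",), ("F",), ("M", "F")]
    people = sorted(citizens)
    for choice in product(options, repeat=len(people)):
        male = {p for p, c in zip(people, choice) if "M" in c}
        female = {p for p, c in zip(people, choice) if "F" in c}
        yield male, female

def certain(citizens, pick):
    """Individuals that belong to the picked relation in every model of DB2."""
    result = None
    for model in models_of_db2(citizens):
        extension = pick(model)
        result = set(extension) if result is None else result & extension
    return result or set()

# p2p reading: a rule fires only on facts that are known at the source node.
male_known = certain(citizens_db1, lambda m: m[0])
female_known = certain(citizens_db1, lambda m: m[1])
p2p_citizen3 = male_known | female_known

# Classical reading: the rules are reasoned with inside every model.
classical_citizen3 = certain(citizens_db1, lambda m: m[0] | m[1])

print(p2p_citizen3)                 # set()
print(sorted(classical_citizen3))   # ['ann', 'bob']
```

Under the p2p reading nothing reaches Citizen-3, while the classical reading
fills it with every citizen, which is exactly the conclusion the text argues
against.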

We shall give a robust logical and computational characterisation of p2p 
database systems, based on the principle sketched above. We say that our for- 
malisation is robust since, unlike other formalisations, it allows for local inconsis- 
tencies in some node of the p2p network: if some database is inconsistent it will 
not result in the entire database being inconsistent. Furthermore, we propose 
a polynomial-time algorithm for query answering over realistic p2p networks, 
which does not have to be aware of the network structure, which can therefore 
change dynamically. 

Our work has been influenced by the semantic definitions of [?], which itself
is based on the work of [?]. [?] defined the Local Relational Model (LRM) to
formalise p2p systems. In LRM all nodes are assumed to be relational databases
and the interaction between them is described by coordination rules and
translation rules between data items. Coordination rules may have an arbitrary
form and allow constraints between nodes to be expressed. The model-theoretic
semantics of coordination rules in [?,?] is non-classical, and it is very close
to the local semantics introduced in this paper.

Various other problems of data management focusing on p2p systems have
been considered in the literature with classical logic-based solutions. We
mention here only a few of them. In [?], query answering for relational
database-based p2p systems under classical semantics is considered, in the case
when both GAV and LAV style mappings between peers are allowed. The mapping
between data sources is given in the WC language allowing for both inclusion and
equality of conjunctive queries over data sources and definitional mappings (that
is, inclusions of positive queries for a relation), and queries have certain answer
semantics. It is proved that in the general case query answering is undecidable
and in the acyclic case with only inclusion mappings allowed, the complexity
of query answering becomes polynomial (if equality peer mappings are allowed,
subject to some restrictions, query answering then becomes co-NP-complete).
An algorithm reformulating a query to a given node into queries to nodes
containing data is provided. In [?] mapping tables (similar to translation rules of
[?]) are considered under different semantics, as well as constraints on mappings
and reasoning over tables and constraints under such conditions. Moreover, see
[?] for the data placement problem, [?] for data trading in data replication, [?]
for the relationship between p2p and Semantic Web, and in general [?] for the
best survey of classical logic-based data integration systems.

This paper is organised as follows. At the beginning, the formal framework 
is introduced; three equivalent ways of defining the semantics of a p2p system 
will be given, together with a fourth one - the extended local semantics - which 
is able to handle inconsistency and will be adopted in the rest of the paper. 
General computational properties will be analysed in Section 3 together with
the special case of p2p systems with the minimal model property. Tight data 
and node complexity bounds for query answering are devised for the Datalog-p2p 
systems and for the acyclic p2p systems. 

2 The Basic Framework 

We first define the nodes of our p2p network as general first order logic (FOL) 
theories sharing a common set of constants. Thus, a node can be seen as repre- 
sented by the set of models of the FOL theory. 

Definition 1 (Local database) Let $I$ be a nonempty finite set of indexes
$\{1, 2, \ldots, n\}$, and $C$ be a set of constants. For each pair of distinct
$i, j \in I$, let $L_i$ be a first order function-free language with signature
disjoint from $L_j$ but for the shared constants $C$. A local database $DB_i$ is
a theory on the first order language $L_i$.

Nodes are interconnected by means of coordination rules. A coordination rule
allows a node $i$ to fetch data from its neighbour nodes $j_1, \ldots, j_k$.

Definition 2 (Coordination rule) A coordination rule is an expression of the
form

$$j_1 : b_1(\mathbf{x}_1, \mathbf{y}_1) \land \cdots \land j_k : b_k(\mathbf{x}_k, \mathbf{y}_k) \Rightarrow i : h(\mathbf{x})$$

where $j_1, \ldots, j_k, i$ are distinct indices, each $b_l(\mathbf{x}_l, \mathbf{y}_l)$
is a formula of $L_{j_l}$ (for $1 \le l \le k$), $h(\mathbf{x})$ is a formula of $L_i$,
and $\mathbf{x} = \mathbf{x}_1 \cup \cdots \cup \mathbf{x}_k$.

Please note that we are making the simplifying assumption that equal
constants mentioned in the various nodes actually refer to equal objects,
i.e., they play the role of URIs (Uniform Resource Identifiers). Other
approaches consider domain relations to map objects between different nodes [?].
We will consider this extension in our future work.

A p2p system is just the collection of nodes interconnected by the rules. 

Definition 3 (p2p system) A peer-to-peer (p2p) system is a tuple of the form
$\mathit{MDB} = \langle \mathit{LDB}, \mathit{CR} \rangle$, where
$\mathit{LDB} = \{DB_1, \ldots, DB_n\}$ is the set of local databases,
and $\mathit{CR}$ is the set of coordination rules.

A user accesses the information held by a p2p system by formulating a query
to a specific node.

Definition 4 (Query) A local query is a first order formula in the language
of one of the databases $DB_i$.

2.1 Global Semantics 

In this section we formally introduce the meaning of a p2p system. We say that
a global model of a p2p system is a FOL interpretation over the union of the
FOL languages satisfying both the FOL theories local to each node and the
coordination rules. Crucially, the semantics of a coordination rule is not the
expected standard universal material implication, as in the classical
information integration approaches. The p2p semantics for the coordination
rules states that if the body of a rule is true in every possible model of the
source nodes then the head of the rule is true in every possible model of the
target node. This notion, different from classical first order logic, is exactly
what we need: only information which is true in the source node is propagated
forward.

Definition 5 (Global semantics) Let $\Delta$ be a non empty set of objects
including $C$ (see Definition 1), and let $\mathit{MDB} = \langle \mathit{LDB}, \mathit{CR} \rangle$
be a p2p system. An interpretation of MDB over $\Delta$ is an n-tuple
$m = \langle m_1, m_2, \ldots, m_n \rangle$ where each $m_i$ is a classical first
order logic interpretation of $L_i$ on the domain $\Delta$ that interprets
constants as themselves.

We adopt the convention that, if $m$ is an interpretation, then $m_i$ denotes the
$i$-th element of $m$.

A (global) model $M$ for MDB, written $M \models_{global} \mathit{MDB}$, is a
nonempty set of interpretations such that:

1. the model locally satisfies the conditions of each database, i.e.,

$$\forall m \in M.\ (m_i \models DB_i)$$

2. and the model satisfies the coordination rules as well, i.e., for any
coordination rule

$$j_1 : b_1(\mathbf{x}_1, \mathbf{y}_1) \land \cdots \land j_k : b_k(\mathbf{x}_k, \mathbf{y}_k) \Rightarrow i : h(\mathbf{x})$$

and for every assignment $\alpha$ (assigning the variables $\mathbf{x}$ to elements
in $\Delta$, and common to all models) the following holds:

$$(\forall m \in M.\ (m_{j_1} \models \exists \mathbf{y}.\, b_1(\mathbf{x}_1, \mathbf{y})) \land \cdots \land (m_{j_k} \models \exists \mathbf{y}.\, b_k(\mathbf{x}_k, \mathbf{y}))) \rightarrow (\forall m \in M.\ (m_i \models h(\mathbf{x})))$$

The answer to a query in a node of the system is nothing else than the tuples 
of values that, substituted to the variables of the query, make the query true in 
each global model restricted to the node itself. 

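Definition 6 (Query answer) Let $Q_i(\mathbf{x})$ be a local query with free
variables $\mathbf{x}$. The answer set of $Q_i$ is the set of substitutions of
$\mathbf{x}$ with constants $\mathbf{c}$, such that any model $M$ of MDB satisfies
the query, i.e.,

$$\{\mathbf{c} \in C \times \cdots \times C \mid \forall M.\ (M \models_{global} \mathit{MDB}) \rightarrow \forall m \in M.\ (m_i \models Q_i(\mathbf{c}))\}$$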

This corresponds to the definition of certain answer in the information inte- 
gration literature. 
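
As an illustration, Definition 6 can be applied to the three-database example of
the Introduction. The instance below is an assumption made only for this
sketch: $DB_1 = \{\text{Citizen-1}(a)\}$, $DB_2$ and $DB_3$ are empty theories,
$C = \{a\}$, and $\mathrm{ans}(\cdot)$ is a hypothetical shorthand for the answer
set.

```latex
\begin{align*}
\mathrm{ans}(\text{Citizen-1}(x)) &= \{a\} && \text{(node 1)} \\
\mathrm{ans}(\text{Male-2}(x) \lor \text{Female-2}(x)) &= \{a\} && \text{(node 2)} \\
\mathrm{ans}(\text{Male-2}(x)) = \mathrm{ans}(\text{Female-2}(x)) &= \emptyset && \text{(node 2)} \\
\mathrm{ans}(\text{Citizen-3}(x)) &= \emptyset && \text{(node 3)}
\end{align*}
```

The last line holds because there is a global model $M$ containing one
interpretation in which $a$ is only male and another in which $a$ is only female:
in such an $M$ neither rule body is true in all interpretations, so nothing forces
$\text{Citizen-3}(a)$.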

2.2 Local Semantics 

The semantics we have introduced in the previous section is called global since
it introduces the notion of a global model which spans over the languages of
all the nodes. In this section we introduce the notion of local semantics, where
models of a p2p system have a node-centric nature which better reflects
the required characteristics. We will prove at the end of the section that the two
semantics are equivalent.

Definition 7 The derived local model $M_i$ is the union of the $i$-th components
of all the models of MDB:

$$M_i = \bigcup_{m \in M,\ M \models_{global} \mathit{MDB}} m_i$$

Lemma 1 The answer set of a local query $Q_i(\mathbf{x})$ coincides with the following:

$$\{\mathbf{c} \in C \times \cdots \times C \mid \forall m_i \in M_i.\ (m_i \models Q_i(\mathbf{c}))\}$$

The above lemma suggests that we could consider somehow $\langle M_1, \ldots, M_n \rangle$
as a model for the p2p system. This alternative semantics, which we call local
semantics as opposed to the global semantics defined in the previous section, is
defined in the following. The notation will sometimes coincide with the one used
in the definition of global semantics; its meaning will be clear from the context.

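Definition 8 (Local semantics) A (local) model $M$ for MDB, written
$M \models \mathit{MDB}$, is a sequence $\langle M_1, \ldots, M_n \rangle$ such that:

1. each $M_i$ is a non empty set of interpretations of $L_i$ over $\Delta$

2. $\forall m_i \in M_i.\ (m_i \models DB_i)$

3. for any coordination rule

$$j_1 : b_1(\mathbf{x}_1, \mathbf{y}_1) \land \cdots \land j_k : b_k(\mathbf{x}_k, \mathbf{y}_k) \Rightarrow i : h(\mathbf{x})$$

and for each assignment $\alpha$ to the variables $\mathbf{x}$ the following holds:

$$((\forall m_{j_1} \in M_{j_1}.\ (m_{j_1} \models \exists \mathbf{y}.\, b_1(\mathbf{x}_1, \mathbf{y}))) \land \cdots \land (\forall m_{j_k} \in M_{j_k}.\ (m_{j_k} \models \exists \mathbf{y}.\, b_k(\mathbf{x}_k, \mathbf{y})))) \rightarrow (\forall m_i \in M_i.\ (m_i \models h(\mathbf{x})))$$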

Definition 9 (Query answer for local semantics) Let $Q_i$ be a local query.
The answer for $Q_i$ is the set of substitutions of $\mathbf{x}$ with constants
$\mathbf{c}$ such that any model $M$ of MDB locally satisfies the query, i.e.:

$$\{\mathbf{c} \in C \times \cdots \times C \mid \forall M.\ (M \models \mathit{MDB}) \rightarrow \forall m_i \in M_i.\ (m_i \models Q_i(\mathbf{c}))\}$$

Theorem 2 The answer sets of a local query $Q_i$ in the global semantics and in
the local semantics coincide.

A way to understand the difference between global and local semantics is
the following. If

$$M = \{\langle m_1^1, \ldots, m_i^1, \ldots, m_n^1 \rangle, \langle m_1^2, \ldots, m_i^2, \ldots, m_n^2 \rangle, \ldots\}$$

is a model for a p2p system in the global semantics, then also

$$M' = \{\langle m_1^1, \ldots, m_i^2, \ldots, m_n^1 \rangle, \langle m_1^2, \ldots, m_i^1, \ldots, m_n^2 \rangle, \ldots\}$$

is a model in the global semantics. In other words, there is no formula express- 
ible in the p2p system which distinguishes two models in the global semantics 
obtained by swapping local models. This is the reason why we can move to the 
local semantics defined in this section without loss of meaning. In fact, the local 
semantics itself does not distinguish between the two above cases, and can be 
therefore considered closer to the intended meaning of the p2p system. 
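
The swapping argument can be made concrete with a small sketch. It is only an
illustration under simplifying assumptions: local interpretations are opaque
labels (hypothetical names such as `m1_a`), and `recombine` builds the largest
set obtainable by swapping, i.e. the Cartesian product of the per-node
projections. Because the conditions of Definition 5 quantify over each component
separately, they only depend on these projections, which the recombination
leaves unchanged.

```python
from itertools import product

def projections(M):
    """Per-node sets of local interpretations occurring in the global model M."""
    n = len(next(iter(M)))
    return [{m[i] for m in M} for i in range(n)]

def recombine(M):
    """All tuples obtainable by swapping local components between members of M."""
    return set(product(*projections(M)))

# A global model with two interpretations over two nodes (labels are placeholders).
M = {("m1_a", "m2_a"), ("m1_b", "m2_b")}
M_prime = recombine(M)

print(len(M_prime))                               # 4
print(projections(M_prime) == projections(M))     # True
```

The sequence returned by `projections` plays the role of the local model
$\langle M_1, \ldots, M_n \rangle$ of Definition 8.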

2.3 Autoepistemic Semantics 

In this section we briefly introduce a third approach to define the semantics of a 
p2p system, as suggested in [?]. This approach can be proved equivalent to the 
global semantics introduced at the beginning - and therefore equivalent to the 
local semantics as well. 

Let us consider KFOL, i.e., the autoepistemic extension of FOL (see, e.g., [?]).
The previous definition of global semantics can be easily changed to fit in a KFOL
framework, so that the p2p system would be expressed in a single KFOL theory
$\Sigma$. Each $DB_i$ would be expressed in KFOL without any change, i.e., without
using the $\mathbf{K}$ operator at all; the coordination rules would be translated
into formulas in $\Sigma$ as

$$\forall \mathbf{x}.\ \mathbf{K}\, \exists \mathbf{y}.\, b(\mathbf{x}, \mathbf{y}) \rightarrow \mathbf{K}\, h(\mathbf{x}).$$

It can be easily proved that the answer set as defined above (Definition 6) in the
global semantics framework is equivalent to the answer set defined in KFOL as
the set of all constants $\mathbf{c}$ such that

$$\Sigma \models \mathbf{K}\, Q_i(\mathbf{c}).$$
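
For instance, applying this translation to the three coordination rules of the
Introduction would give the following formulas (a sketch obtained by instantiating
the schema above; the rules have no existential variables, so the
$\exists \mathbf{y}$ prefix disappears):

```latex
\begin{align*}
&\forall x.\ \mathbf{K}\,\text{Citizen-1}(x) \rightarrow \mathbf{K}\,(\text{Male-2}(x) \lor \text{Female-2}(x)) \\
&\forall x.\ \mathbf{K}\,\text{Male-2}(x) \rightarrow \mathbf{K}\,\text{Citizen-3}(x) \\
&\forall x.\ \mathbf{K}\,\text{Female-2}(x) \rightarrow \mathbf{K}\,\text{Citizen-3}(x)
\end{align*}
```

Since $\mathbf{K}(A \lor B)$ does not entail $\mathbf{K}A \lor \mathbf{K}B$, knowing
the disjunction for a citizen does not trigger either of the last two formulas,
which matches the intended behaviour of the example.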

2.4 Extended Local Semantics to Handle Inconsistency 

The semantics defined above does not formalise local inconsistency. In fact as 
soon as a local database becomes inconsistent, or a coordination rule pushes 
inconsistency somewhere, both the global and the local semantics say that no 
model of MDB exists. This means that local inconsistency implies global incon- 
sistency, and the p2p system is not robust. 

Proposition 3 For any p2p system such that there is an $i$ such that $DB_i$ is
inconsistent, the answer set of any query $Q_j(\mathbf{x})$ is equal to
$C \times \cdots \times C$, for both the global and local semantics.

In order to have a robust p2p system that remains meaningful even in the presence
of some inconsistent node, we extend the local semantics by allowing single $M_i$
to be the empty set. This captures the inconsistency of a local database: we
say that a local database $DB_i$ is inconsistent if $M_i$ is empty for any model of
the p2p system. A database depending on an inconsistent one through some
coordination rule will have each dependent view - i.e., the formula in the head
of the rules with $n$ free variables - equivalent to $\Delta^n$, and the databases
not depending on the inconsistent one will remain consistent. Therefore, in the
presence of local inconsistency the global p2p system remains consistent.

The following example will clarify the difference between the local semantics 
and the extended local semantics in handling inconsistency. 

Example 1. Consider the p2p system composed of a node $DB_1$ containing a
unary predicate $P$ and an inconsistent axiom $\bot$, and another node $DB_2$
containing two unary predicates $Q$ and $R$ with no specific axiom on them. Let

$$1 : P(x) \Rightarrow 2 : Q(x)$$

be a coordination rule from $DB_1$ to $DB_2$. Even though $DB_1$ is inconsistent,
there is a model $M = \langle M_1, M_2 \rangle$ where $M_2$ is not the empty set.
The answer set of the query $Q(x)$ in 2 is the whole set of constants known to the
p2p system. Furthermore, the answer set of the query $R(x)$ in 2 is the empty set.
So, in this case the inconsistency does not propagate through the coordination
rule to every predicate of $DB_2$.

Let us suppose now that $DB_2$ contains in addition the axiom
$\exists x\, \lnot Q(x)$. Then, the only model (in the extended local semantics) is
$\langle M_1, M_2 \rangle$ where both $M_1$ and $M_2$ are the empty set.
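
Example 1 is small enough to be checked by brute force. The following is a
minimal sketch under explicit assumptions: the domain coincides with two
hypothetical constants `a` and `b`, an interpretation of node 2 is a pair of
extensions for $Q$ and $R$, and only the largest admissible $M_2$ is computed
(answers that hold in it hold in every smaller $M_2$ as well). The helper names
are illustrative.

```python
from itertools import combinations

DOMAIN = ("a", "b")  # assumed constants of the p2p system

def subsets(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def maximal_m2(m1_models, extra_axiom=lambda q, r: True):
    """Largest M2 compatible with the rule 1:P(x) => 2:Q(x).

    m1_models is the set of P-extensions in M1; it is empty when DB1 is
    inconsistent, in which case the rule body holds vacuously for every element.
    """
    forced_q = {d for d in DOMAIN if all(d in p for p in m1_models)}
    return {(q, r) for q in subsets(DOMAIN) for r in subsets(DOMAIN)
            if forced_q <= q and extra_axiom(q, r)}

def certain(m2, pick):
    """Constants belonging to the picked predicate in every interpretation of M2."""
    return {d for d in DOMAIN if all(d in pick(m) for m in m2)}

# DB1 is inconsistent, so M1 is the empty set of interpretations.
m2 = maximal_m2(m1_models=set())
print(sorted(certain(m2, lambda m: m[0])))  # answers to Q(x): ['a', 'b']
print(sorted(certain(m2, lambda m: m[1])))  # answers to R(x): []

# Adding the axiom "some x is not in Q" to DB2 leaves no interpretation at all.
m2_axiom = maximal_m2(set(), extra_axiom=lambda q, r: len(q) < len(DOMAIN))
print(m2_axiom)  # set()
```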



In the case of fully consistent p2p systems, the local semantics and the ex- 
tended local semantics coincide. In the case of some local inconsistency, the local 
(or, equivalently, the global) semantics will imply a globally inconsistent system, 
while the extended local semantics is able to still give meaningful answers. 

Theorem 4 If there is a model for MDB with the local (or global, or
autoepistemic) semantics then for each query the answer set with the local (or
global, or autoepistemic) semantics coincides with the answer set with the
extended local semantics.

3 Computing Answers 

In this section, we will consider the global properties of a generic p2p system: we
will try to find the conditions under which a computable solution to the query
answering problem exists, we will investigate its properties and how to compute
it in some logical database language. From now on, we assume the extended local
semantics - i.e., the semantics of the p2p system able to cope with inconsistency.
We include the sketches of some proofs.

Let us define the inclusion relation between models of a p2p system. A model
$M$ is included into $N$ ($M \subseteq N$) if for each node $i$, the set of models
of $i$ in $M$ is a subset of the set of models for $i$ in $N$.

Let CR be a set of coordination rules and $M$ an interpretation of MDB, i.e.,
a sequence $\langle M_1, \ldots, M_n \rangle$ such that each $M_i$ is a set of
interpretations of $L_i$ over $\Delta$. A ground formula $A$ is a derived fact for
$M$ and CR if either $M \models A$, or $i : \psi \Rightarrow j : A$ is an
instantiation of a rule in CR and $M \models \psi$. Please remember that when we
write $M \models \psi$, where $M$ is a model for MDB, we intend the
logical implication for the extended local semantics.

Definition 10 (Immediate consequence operator) Let MDB be a p2p system,
CR a set of coordination rules, and $M$ a model of MDB. A model $M'$ is an
immediate consequence for $M$ and CR if it is a maximal model included into $M$
such that each $M'_i \in M'$ contains facts derived by CR from $M$. The immediate
consequence operator for MDB, denoted $T_{MDB}$, is the mapping from a set of
models into a set of models such that for each $M$, $T_{MDB}(M)$ is an immediate
consequence of $M$.

A few lemmas about the properties of the consequence operator are needed
to prove our main theorem.

Lemma 5 The operator $T_{MDB}$ is monotonic with respect to model inclusion,
i.e., if $M \subseteq N$, then $T_{MDB}(M) \subseteq T_{MDB}(N)$.

Proof. For each rule create a ground instantiation of it. Each ground instance of
CR in $N$ is also present in $M$. This means that for each new formula derivable
in $N$ the same formula is derivable in $M$. So, all models which are refused
during the application of the operator in $N$ are also refused in $M$. Therefore,
$T_{MDB}(M) \subseteq T_{MDB}(N)$.

Lemma 6 The operator $T_{MDB}$ is monotonic with respect to the set of ground
instantiations of rules satisfied (the set of ground instances of rules derived at
some step of the execution of an operator remains valid for all the subsequent
steps).

Proof. Let us assume that a rule $i : \psi(\mathbf{x}, \mathbf{y}) \Rightarrow j : \phi(\mathbf{x})$
is instantiated for some $\mathbf{x}$, $\mathbf{y}$ at step $n$ for the sets of
models $M_i^n$, $M_j^n$. Clearly, it will remain valid for any step $m > n$, given
the semantics of the rules and that $M_i^m \subseteq M_i^n$, $M_j^m \subseteq M_j^n$.

Lemma 7 For any initial model $M$, the operator $T_{MDB}$ reaches a fixpoint which
is a model of MDB.

Proof. Since we begin from a finite set of models, after a finite number of steps
we reach a lower bound (possibly the empty set of models): this is a set of models
which satisfy MDB. In fact, all local FOL theories are satisfied by definition of
$T_{MDB}$, and if some rule in CR is not satisfied then an execution of $T_{MDB}$ will
lead to a new model, but this would contradict the reaching of the fixpoint. If
the empty set of models is reached then MDB is trivially satisfied.

The main theorem states that we can use the consequence operator to com- 
pute the answer to a query to a p2p system. 

Theorem 8 The certain answer of a query to a p2p system MDB is the certain
answer of the query over the model $T^{\omega}_{MDB}(M_0)$, where $M_0$ is the model
set consisting of the Cartesian product of all the interpretations satisfying the
local FOL theories.

Proof. ($\subseteq$) If $Q(a)$ is a certain answer, then, since $Q(a)$ is true in any
model, it is true in the model resulting by applying the operator to the maximum
original set. So,
$\{\mathbf{x} \mid \mathit{MDB} \models Q(\mathbf{x})\} \subseteq \{\mathbf{x} \mid T^{\omega}_{MDB}(M_0) \models Q(\mathbf{x})\}$.

($\supseteq$) Since the original interpretation is the Cartesian product of all local
interpretations, any particular model consisting of a set of local models is a
subset of $M_0$, i.e., $\forall M.\ M \subseteq M_0$. By monotonicity of the operator,
it holds that

$$\forall M.\ T^{\omega}_{MDB}(M) \subseteq T^{\omega}_{MDB}(M_0)$$

Therefore,
$\{\mathbf{x} \mid \mathit{MDB} \models Q(\mathbf{x})\} \supseteq \{\mathbf{x} \mid T^{\omega}_{MDB}(M_0) \models Q(\mathbf{x})\}$.

3.1 Computation with Minimal Models 

Let us now assume that at each node the minimal model property holds - i.e.,
in each local database the intersection of all local models is itself a model of
the local FOL theory, and it is minimal with respect to set inclusion. Let us
assume also that the coordination rules preserve this property - e.g., the body of
any rule is a conjunctive query and the head of any rule is a conjunctive query
without existential variables. We say that in this case the p2p system enjoys the
minimal model property. Then, it is possible to simplify the computation procedure
defined by the $T_{MDB}$ operator. In such a case the computation is reducible to a
"migration of facts". The procedure is crucially simplified if it is impossible to
get inconsistency in local nodes (as for Datalog or relational databases).

Definition 11 (Minimal model property) The consequence operator $T^{min}_{MDB}$
for MDB with the minimal model property is defined in the following way:

- at the beginning, the minimal model is given for each node;

- at each step, $T^{min}_{MDB}$ computes for each coordination rule a set of derived
facts and adds them into the local nodes;

- if for a node $j$ an inconsistent theory is derived, then the current model is
replaced by the empty set, otherwise the current theory is extended with the
derived facts and the minimal model is replaced by the minimal model of the
new theory.

We denote with $T^{min,\omega}_{MDB}$ the fixpoint of $T^{min}_{MDB}$.
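
The "migration of facts" can be sketched for the simplest setting mentioned in
the text, where nodes are plain relational databases and inconsistency cannot
arise (so the third step of the definition never replaces a model by the empty
set). This is an illustrative sketch, not the paper's algorithm: facts are
triples `(node, predicate, arguments)`, every term in a rule is a variable, head
variables must occur in the body, and all relation names are hypothetical.

```python
def match(body, facts, binding=None):
    """Yield variable bindings that make every body atom a known fact."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (node, pred, variables), rest = body[0], body[1:]
    for f_node, f_pred, f_args in facts:
        if (f_node, f_pred) != (node, pred) or len(f_args) != len(variables):
            continue
        new = dict(binding)
        # Bind each variable, rejecting the fact on any conflicting binding.
        if all(new.setdefault(v, a) == a for v, a in zip(variables, f_args)):
            yield from match(rest, facts, new)

def migrate(facts, rules):
    """Apply the coordination rules until no new fact is derived (fixpoint)."""
    facts = set(facts)
    while True:
        derived = set()
        for body, (h_node, h_pred, h_vars) in rules:
            for b in match(body, facts):
                derived.add((h_node, h_pred, tuple(b[v] for v in h_vars)))
        if derived <= facts:
            return facts
        facts |= derived

# Hypothetical data at node 1 and two coordination rules: 1 -> 2 and 2 -> 3.
facts = {(1, "Person", ("ann",)), (1, "Lives", ("ann", "trento"))}
rules = [
    ([(1, "Person", ("X",)), (1, "Lives", ("X", "C"))], (2, "Resident", ("X", "C"))),
    ([(2, "Resident", ("X", "C"))], (3, "Citizen", ("X",))),
]
result = migrate(facts, rules)
print((3, "Citizen", ("ann",)) in result)  # True
```

Each pass of the loop corresponds to one step of the operator; the loop stops
when a pass derives nothing new.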

Theorem 9 If the p2p system has the minimal model property, then for positive
queries $Q(\mathbf{x})$

$$T^{min,\omega}_{MDB}(M_{min}) \models Q(\mathbf{x}) \leftrightarrow \mathit{MDB} \models Q(\mathbf{x})$$

Proof. If $M_{min}$ is the minimal model, then if $\varphi$ does not contain negation,
($\forall M$ model of MDB, $M \models \varphi$) $\leftrightarrow$ $M_{min} \models \varphi$.
Let us assume that we execute $T_{MDB}(M_0)$, where $M_0$ is the set of all the
models of each node. Assume that at step $i$ of the execution of
$T^{min}_{MDB}(M_{min})$ we get the minimal model of the outcome of step $i$ of the
execution of $T_{MDB}(M_0)$ (which is evidently true for step 0). The set of derived
facts for each node at step $i + 1$ for $T_{MDB}$ will be the same as for
$T^{min}_{MDB}$, so that at step $i + 1$ the theories for the execution of $T_{MDB}$
and $T^{min}_{MDB}$ will be the same. By definition of $T^{min}_{MDB}$, this will give
a minimal model at the $i + 1$ step. If at step $n$ $T_{MDB}$ reaches a fixpoint,
then $T^{min}_{MDB}$ reaches a fixpoint as well with the minimal model corresponding
to the models devised by $T_{MDB}$. Since $Q$ is a positive query, the thesis is
proved.

This theorem means that a p2p system with nodes and coordination rules 
with the minimal model property collapses to a traditional p2p and data integra- 
tion system like [?,?] based on classical logic. A special case is when each node 
is either a pure relational database or a Datalog-based deductive database (in 
either case the node enjoys the minimal model property), and each rule has the 
body in the form of a conjunctive query and the head in the form of a conjunctive 
query without existential variables. We call such a system a Datalog-p2p
system. In such a case, it is possible to introduce a simple "global program" to
answer queries to the p2p system. The global program is a single Datalog program
obtained by taking the union of all local Datalog programs and of the coordination
rules expressed in Datalog, plus the data at the nodes seen as EDB.

We are able to precisely characterise the data and node complexity of query
answering in a Datalog-p2p system. The data complexity is the complexity of
evaluating a fixed query in a p2p system with a fixed number of nodes and
coordination rules over databases of variable size - as input we consider here
the total size of all the databases. The node complexity, which we believe is a
relevant complexity measure for a p2p system, is the complexity of evaluating a
fixed query over databases of a fixed size with respect to a variable number of
nodes in a p2p system with a fixed number of coordination rules between each
pair of nodes. It turns out that the worst case node complexity is rather high.

Theorem 10 (Complexity of Datalog-p2p) The data complexity of query
answering for positive queries in a Datalog-p2p system is in PTIME, while
the node complexity of query answering in a Datalog-p2p system is
EXPTIME-complete.

Proof. The proof is obtained by reducing the problem to a global Datalog
program and considering complexity results for Datalog.
It can be shown that the node complexity becomes polynomial under the 
realistic assumption that the number of coordination rules is logarithmic with 
respect to the number of nodes. 

3.2 A Distributed Algorithm for Datalog-p2p Systems 

Clearly, the global Datalog program devised in the previous section is not
how query answering should be implemented in a p2p system. In fact, the
global program requires the presence of a central node in the network, which 
knows all the coordination rules and imports all the databases, so that the 
global program can be executed. A p2p system should implement a distributed 
algorithm, so that each node executes locally a part of it in complete autonomy 
and it may delegate to neighbour nodes the execution of subtasks, so that there 
is no need for a centralised authority controlling the process. 

In [?] a distributed algorithm for query answering has been introduced, which 
is sound and complete for an extension of Datalog-p2p systems. In that work, 
a Datalog-p2p system is called a definite deductive multiple database, where 
domain relations translating query results from the different domains of the 
various nodes are also allowed. So, we can fully adopt this procedure in our 
context by assuming identity domain relations. In this paper we do not give the 
details of the distributed algorithm, which can be found in [?,?]. 

3.3 Acyclic p2p Systems 

A p2p system is acyclic if the dependency graph induced by the coordination 
rules is acyclic. The acyclic case is worth considering since the node complexity of 
query answering is greatly reduced - it becomes quadratic - and more expressive 
rules are allowed. 

Theorem 11 (Complexity of acyclic p2p) Answering a conjunctive query
in an acyclic p2p system with coordination rules having unrestricted conjunctive
queries both at the head and at the body is in PTIME. If a positive query is
allowed at the head of a coordination rule then query answering becomes
coNP-complete. In both cases the node complexity of query answering is quadratic,
and it becomes linear in the case of the network being a tree.

Proof. The proof follows by reducing to the problem of query answering using
views (see, e.g., [?]).

This result extends Theorem 3.1 part 2 of [?]. 

A distributed algorithm for an acyclic p2p system would work as follows.
A node answers a query first by populating the views defined by the heads
of the coordination rules of which the node itself is the target with the answers
to the queries in the bodies of such rules, and then by answering the query using
such views. Of course, answering the queries in the bodies of the rules
recursively involves the neighbour nodes.
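
The recursive procedure just described can be sketched as follows. This is a
deliberately simplified illustration, not the algorithm of the cited works: each
rule has a single body relation and a head view obtained by projecting some of
its columns, the whole network is held in two in-memory tables instead of being
distributed, and all relation names and data are hypothetical. Termination relies
on the rule graph being acyclic.

```python
# Hypothetical local data: node -> relation name -> set of tuples.
LOCAL = {
    1: {"Emp": {("ann", "sales"), ("bob", "it")}},
    2: {"Staff": {("carl",)}},
    3: {},
}

# Hypothetical coordination rules:
# (source node, source relation, kept columns, target node, target view).
RULES = [
    (1, "Emp", (0,), 2, "Staff"),
    (2, "Staff", (0,), 3, "Person"),
]

def answer(node, relation):
    """Tuples of `relation` at `node`: local data plus data fetched through rules."""
    tuples = set(LOCAL.get(node, {}).get(relation, set()))
    for src, src_rel, cols, tgt, view in RULES:
        if (tgt, view) == (node, relation):
            # Ask the neighbour for the body of the rule, recursively,
            # then populate the view with the projected rows.
            for row in answer(src, src_rel):
                tuples.add(tuple(row[c] for c in cols))
    return tuples

print(sorted(answer(3, "Person")))  # [('ann',), ('bob',), ('carl',)]
```

In a real deployment each call to `answer` on a different node would be a
message to that neighbour, so no node needs to know the rules of the others.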

It is possible to exploit the low node complexity of acyclic systems (which
have a tree-like topological structure) to build more complex network topologies
still with a quadratic node complexity for query answering. The idea is to
introduce in an acyclic network the notion of fixed size autonomous subnetworks
where cyclic rules are allowed, and a super-peer node is in charge of
communicating with the rest of the network. This architecture matches exactly
the notion of super-peer in real p2p systems like Gnutella.

4 Conclusions 

In this paper, we propose a new model for the semantics of a p2p database 
system. In contrast to previous approaches our semantics is not based on the 
standard first-order semantics. 

In our opinion, this approach captures more precisely the intended semantics
of p2p systems. It models a framework in which a node can request data from
another node, which can involve evaluating a query locally and/or requesting,
in turn, data from a third node, but cannot involve evaluating complex queries
over the entire network, as would be the case if the network were an integrated
system as in standard work on data integration.

One interesting consequence is in the way we handle inconsistency. In a p2p 
system, with many independent nodes, there is a possibility that some nodes will 
contain inconsistent data. In standard approaches, this would result in the whole 
database being inconsistent, an undesirable situation. In our framework, the 
inconsistency will not propagate, and the whole database will remain consistent. 

The results we have presented show that the original, global, semantics and 
an alternative, local, semantics are in fact equivalent, and we then extended it 
in order to handle inconsistency. We also give an algorithm for query evaluation, 
and some results on special cases where queries can be evaluated more efficiently. 

Directions for future work include studying more thoroughly the complexity 
of query evaluation, as well as special cases, for example ones with appropriate 
network topologies, for which query evaluation is more tractable. Another issue 
is that of domain relations. These were introduced in [?] to capture the fact that 
different nodes in a p2p system may not use the same underlying domains, and 
show how to map one domain to another. Such relations are not studied in the
current paper, and their integration in our framework is another area for future
research.